ABSTRACT

This invention is directed to a method and apparatus for providing low, predictable 
latencies in processing IP packets. The apparatus provides a specialized microprocessor 
or hardwired circuitry to process IP packets for video communications and control of the
video source without an operating system. The method relates to operation of a 
microprocessor which is suitably arranged to carry out the steps of the method. The 
method includes details of operation of the specialized microprocessor.

Massively Reduced Instruction Set Processor
FIELD OF INVENTION 

This invention relates in general to microprocessors, and in particular, to a 
microprocessor used for data communications.

BACKGROUND OF THE INVENTION 

Over the last few decades Internet Protocol (IP) communications have become the 
dominant form of electronic communication. IP communications allow the use of a wide 
array of different protocols. To simplify data handling and routing, the protocols are
arranged in a stack and the "lowest-level" protocols encapsulate the higher-level
protocols. This encapsulation allows the idiosyncrasies of the higher level protocols to be 
hidden from the routing functions and further allows the partitioning of the analysis of 
the data. 

In stand-alone devices, also known as embedded products and embedded devices, 
embedded computers are typically used to perform the encapsulation and de-encapsulation
to send and receive the data respectively. An embedded computer is
characterized as having a general purpose CPU, with associated memory. The computer
runs an Operating System (OS), such as embedded Linux. The protocol processing is
handled by the OS and application software is provided that runs on top of the OS to 
handle the communications functions and other tasks that are required. 

This architecture is analogous to what is provided on general purpose computers (PCs) 
and workstations. Using the same processes to handle the communications in the
embedded device as are used on general purpose computers is natural since IP
communications were first performed only on general purpose computers and later
migrated to embedded devices. 

However, unlike general purpose computers, embedded devices have only
limited resources and are highly cost sensitive. The processor that can be employed in an
embedded computer is often very limited in performance due to cost, space, and power
consumption constraints. As a result, a high-bandwidth embedded device often cannot be
IP enabled cost effectively.

To handle multiple tasks, a real-time operating system (RTOS) is often employed, which
provides the ability to respond to system requests in a very short period of time. Even
with this, applications such as high performance image delivery for machine vision find
the level of latency and the variation in the latency associated with the delivery of the
video to be unacceptable. Further, when OS-based embedded devices are pushed to their
limits they can become unreliable, with deadlocks that freeze the device.

It is obvious that the above implementations do not address the requirements for protocol
processing on a device such as a high-speed electronic video camera or other
high-bandwidth device. Therefore there is a need for a method and apparatus capable of
processing IP packets with low, consistent latencies that are suitable for delivering video
over an IP network.

SUMMARY OF THE INVENTION

This invention is directed to a method and apparatus for providing low, predictable 
latencies in processing IP packets. The apparatus provides a specialized microprocessor 
or hardwired circuitry to process IP packets for video communications and control of the 
video source without an operating system. The method relates to operation of a 
microprocessor which is suitably arranged to carry out the steps of the method. The
method includes details of operation of the specialized microprocessor. 

In accordance with one aspect of this invention, a massively reduced instruction set
processor (mRISP) is disclosed, which is a tiny embedded soft processor tailored for
processing communication protocols in accordance with the method disclosed herein. In
a preferred embodiment, this processor has only two instructions and some optional
registers performing basic functions, such as arithmetic and logical functions, and
specialized functions like Program Counter, Timers, IP Checksum and DMA. The mRISP is
a soft implementation because it is fully configurable upon construction: the registers
and features required in the implementation are specified, and the processor is then
built through synthesis of a register transfer level (RTL) representation of the design.
The processor that is created from the synthesis is tailored for a specialized task,
such as data communications.

40196514.1

The two IPP instructions are LOAD and MOVE, which are the minimal instructions
necessary for a processor. Some macros are built over these two instructions in
conjunction with registers to add other basic functionality such as JMP, CALL and
RET. The macros are used in the compiler for the instruction set for the IPP, and are
built solely using the LOAD and MOVE instructions.

The core is optimized for a 16-bit data bus and a 32-bit instruction bus,
although it can be configured for wider or narrower bus widths. In 16-bit data mode,
bytes can be swapped for single byte access and operation. The 32-bit instruction bus,
separated from the data bus, allows the timing to be reduced to only one clock cycle for a
LOAD and two clock cycles for a MOVE. An extra clock cycle is added to the timing on
a jump in the program counter.

For slow external memory fetching or for any other specific reason, external logic can
be added to control the HOLD input signal and hold the processor for a required number
of clock cycles. In addition, specialized waiting functions, if required and
activated, can hold the processor until an expected event occurs.

With such a processor, IP packets can be processed at significantly higher rates, with
lower, more consistent latencies, than can be accomplished using a general purpose
microprocessor.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is the preferred embodiment of the invention.
Figure 2 is a description of the mRISP State Machine.
Figure 3 is a description of the Data Path.
Figure 4 is a description of the checksum register.
Figure 5 is a description of the MOVE instruction.
Figure 6 is a description of the LOAD instruction.
Figure 7 is a description of the General Purpose Register A.



CA 02443347 2003-09-29 



Figure 8 is a description of the General Purpose Register B.
Figure 9 is a description of the Program Counter.
Figure 10 is a description of the Return Register.
Figure 11 is a description of the Mask Register.
Figure 12 is a description of the Wait Register.
Figure 13 is a description of the Timer 0 Register.
Figure 14 is a description of the Timer 1 Register.
Figure 15 is a description of the Checksum Register.
Figure 16 is a description of the DMA Register.
Figure 17 is a description of the jump and call conditions when writing in the Program Counter.
Figure 18 is a description of a possible set of macros that could be used.
Figure 19 is a cycle by cycle representation of the mRISP State Machine with different cases.
Figure 20 is an implementation of the Event Block.
Figure 21 is an implementation of the instruction formatter with the opcode decoder, address detector and byte swapping detector.
Figure 22 is a description of the Internal Registers and Functions.
Figure X is a representation of the current invention which includes mathematical and logical operations in the processor.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

In the following detailed description of the embodiments, reference is made to the 
accompanying drawings, which form a part hereof, and in which is shown by way of
illustration specific embodiments in which the invention may be practiced. These 
embodiments are described in sufficient detail to enable those skilled in the art to 
practice the invention, and it is to be understood that other embodiments may be utilized 
and that structural, logical and electrical changes may be made without departing from 
the spirit and scope of the present inventions. The following detailed description is,
therefore, not to be taken in a limiting sense, and the scope of the present inventions is
defined only by the appended claims.

The mRISP implements the CPU with separate data and program memory buses, generally
known as a Harvard memory architecture. The mRISP program memory bus is 32 bits
wide and is used to fetch instructions from memory. The mRISP data bus is 16 bits wide
and is used to move 16-bit data words from user memory, internal registers or program
memory to user memory and internal registers. The external user memory bus may be
connected to memories or peripherals.

Instruction Set 

The mRISP instruction set is massively reduced to only two instructions. The first is
the MOVE instruction, which moves data from a source address to a destination
address. The only other instruction necessary for a functional CPU is the LOAD
instruction, which initializes memory and registers to a given value from the program
memory.

The 32-bit instruction contains only one bit to decode the opcode. On a MOVE
instruction, 14 bits are dedicated to the source address and another 14 bits to the
destination address, leaving 3 bits unused. On a LOAD instruction, 14 bits are used for
the destination address and 16 bits for the constant word to load, leaving 1 bit unused.

The MSB of each address (source and destination) is used to select between the
external user memory region and the internal registers region. The LSB of each
address (source and destination) is used to decode whether data byte swapping has to be
done. Thus 12 bits out of 14 are available to address user memory and peripherals. The
external memory is addressed in 16-bit words.
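
The field widths above can be illustrated with a short Python sketch. The widths (1-bit opcode, 14-bit addresses, 16-bit constant) and the roles of the address MSB and LSB are taken from the description. The positions of the fields within the 32-bit word, the opcode polarity and the polarity of the region-select bit are assumptions made purely for illustration; the actual layout is the one given in the figures. The names Address, split_address and decode are hypothetical.

```python
from typing import NamedTuple

ADDR_BITS = 14  # width of each address field, per the description


class Address(NamedTuple):
    internal: bool  # MSB: True = internal registers region (assumed polarity)
    index: int      # remaining 12 bits: word index within the region
    swap: bool      # LSB: byte-swap bit


def split_address(addr: int) -> Address:
    """Split a 14-bit address field into region select, 12-bit index and swap bit."""
    addr &= (1 << ADDR_BITS) - 1
    return Address(
        internal=bool(addr >> (ADDR_BITS - 1)),
        index=(addr >> 1) & 0xFFF,
        swap=bool(addr & 1),
    )


def decode(word: int) -> dict:
    """Decode a 32-bit instruction under an ASSUMED field placement.

    Assumed layout (not taken from the figures):
      bit 31   opcode (0 = MOVE, 1 = LOAD)
      MOVE:    bits 30..17 source, bits 16..3 destination, bits 2..0 unused
      LOAD:    bits 30..17 destination, bits 16..1 constant, bit 0 unused
    """
    if (word >> 31) & 1 == 0:
        return {
            "op": "MOVE",
            "src": split_address((word >> 17) & 0x3FFF),
            "dst": split_address((word >> 3) & 0x3FFF),
        }
    return {
        "op": "LOAD",
        "dst": split_address((word >> 17) & 0x3FFF),
        "const": (word >> 1) & 0xFFFF,
    }
```

Under these assumptions, both formats account for all 32 bits: 1 + 14 + 14 + 3 for a MOVE and 1 + 14 + 16 + 1 for a LOAD.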

Figures 5 and 6 provide a bit by bit description of the MOVE and LOAD instructions.

Figure 21 provides an implementation of the instruction formatter with the opcode
decoder, address detector and byte swapping detector.

Internal Registers and Functions

The mRISP has two general-purpose registers (REG_A and REG_B) and some dedicated
registers performing specific functions (i.e. features). Those functions may be any
combination or number of arithmetic (increment, adder), logical (AND, OR, XOR),
comparator, timer, DMA, interrupt and program counter functions. The arithmetic and
logical registers use the general-purpose registers with the mask register as inputs. Their
values are constantly updated as the general-purpose registers change.

Two registers are specifically designed to process Internet Protocols. The first one
(CSUM) is used to compute Internet Protocol checksums. The IP checksum is computed
using the 16-bit one's complement sum of the corresponding data.
Each time a write is done into the CSUM register, the 16-bit one's complement addition
of the previous value and the written value is computed. When all the data to be
included in the checksum has been written to this register, a read of this register gives
the 16-bit checksum by inverting the present value of the sum. A read resets the CSUM
register to zero, ready for another computation. By filling the checksum field(s) in the IP
header(s) with a magic number, the checksum can be computed serially as the data is
being packetized. One byte is added at the end of the packet with the appropriate data
necessary to make the magic number in the header field correct.
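
The write and read behaviour of the CSUM register described above can be modelled in a few lines of Python. This is a behavioural sketch only, not the RTL; the class name CsumRegister and its method names are hypothetical.

```python
class CsumRegister:
    """Behavioural model of the CSUM register as described in the text."""

    def __init__(self) -> None:
        self._sum = 0  # running 16-bit one's complement sum

    def write(self, word: int) -> None:
        """Accumulate one 16-bit word using one's complement addition."""
        total = self._sum + (word & 0xFFFF)
        # End-around carry: fold the carry out of bit 15 back into bit 0.
        self._sum = (total & 0xFFFF) + (total >> 16)

    def read(self) -> int:
        """Return the inverted sum and reset the register to zero."""
        result = ~self._sum & 0xFFFF
        self._sum = 0
        return result
```

For example, writing the nine words 0x4500, 0x0073, 0x0000, 0x4000, 0x4011, 0xC0A8, 0x0001, 0xC0A8 and 0x00C7 of an IPv4 header (with its checksum field omitted) and then reading the register gives 0xB861. Writing the same words together with 0xB861 and reading again gives zero, which is the usual way a receiver verifies a header.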

The second register (DMA) is used to move multiple data words from one location to
another within three instructions. When one location is an internal register, its address
is not incremented, which makes it possible to send consecutive data in memory into one
special register or to initialize consecutive data in memory with one register's value. In
conjunction with CSUM, Internet Protocol checksums can be computed quickly with only a few
instructions.

Comparisons between REG_A and REG_B are constantly computed. Two flags are
sufficient to perform all comparisons (equal, not equal, less than, greater than,
less than or equal, and greater than or equal). The first is the "A Equal B"
flag (eq) and the second is the "A Greater than B" flag (gt). Those flags are used in
conjunction with the Program Counter (PCNT) to enable conditional jumps. The
descriptions of the General Purpose registers, Program Counter, Return Register, Mask
Register, Wait Register, Timer 0 Register, Timer 1 Register, Checksum Register and
DMA Register are provided in Figures 7-16 respectively.
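
That two flags suffice for all six comparisons can be checked with a small Python sketch. It assumes the register values are compared as unsigned integers, which the description does not state; the function name comparisons and the result keys are hypothetical.

```python
def comparisons(reg_a: int, reg_b: int) -> dict:
    """Derive all six comparison results from the eq and gt flags alone."""
    eq = reg_a == reg_b  # "A Equal B" flag
    gt = reg_a > reg_b   # "A Greater than B" flag (unsigned compare assumed)
    return {
        "eq": eq,
        "ne": not eq,
        "gt": gt,
        "lt": not (eq or gt),
        "ge": eq or gt,
        "le": not gt,
    }
```

Since eq and gt can never both be true, the three reachable flag combinations correspond exactly to A < B, A = B and A > B.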

Figure 22 provides a description of the Internal Registers and Functions. 

Program Counter and Return Registers 

The Program Counter register (PCNT) is cleared to zero on reset and is incremented by
one on the last cycle of every instruction (when prd is high). It always points to the next
instruction during the processing of the current instruction. A jump in the program
memory can be accomplished by writing the new instruction's address in the Program
Counter register. The jump can be conditional or not, depending on the state of the
comparator flags (eq and gt) and the setting of the three flag bits (IE, IG and IN) in the
Program Counter Register.

A CALL instruction can be accomplished by writing the sub-routine's address in the
Program Counter register and by setting the flags to IE=0, IG=0 and IN=1. In this case, the
Return register (RETA) loads the Program Counter's value at the same time the jump is
done. Later, on a RET instruction (performed by moving RETA's value into the PCNT
register), the mRISP resumes fetching instructions at the one following the CALL instruction.
The stack is implemented in hardware and its depth is configurable at synthesis. The stack
is structured as a LIFO (Last In First Out). On a CALL instruction, the Program Counter's
value is pushed onto the LIFO, and on a RET instruction, the value to write into the
Program Counter is pulled from the LIFO.
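
The CALL and RET behaviour can be sketched in Python as a fixed-depth LIFO. The class name ReturnStack is hypothetical, and the behaviour when the configured depth is exceeded is not described in the text; this sketch assumes the oldest entry is silently discarded.

```python
from collections import deque


class ReturnStack:
    """Fixed-depth LIFO holding return addresses (behavioural sketch)."""

    def __init__(self, depth: int) -> None:
        # Assumption: on overflow the oldest return address is dropped.
        self._lifo = deque(maxlen=depth)

    def call(self, pcnt: int, target: int) -> int:
        """Push the current PCNT value and return the new PCNT value.

        PCNT already points to the instruction after the CALL, so the
        pushed value is the return address.
        """
        self._lifo.append(pcnt)
        return target

    def ret(self) -> int:
        """Pull the most recent return address to be written into PCNT."""
        return self._lifo.pop()
```

Nested calls return in reverse order, as expected from a LIFO. In this sketch, calling ret() on an empty stack raises IndexError; the hardware behaviour in that case is not specified in the description.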

Figure 17 summarizes the jump and the call conditions when writing in the Program
Counter.

Event Handling

The mRISP allows up to 16 events, which can be generated from either of two sources:
external hardware interrupts or internal events. The internal events may come from
timers, a real-time timer and watchdog logic. All events are completely handled by
software and no event can interrupt the execution of the program. The software itself
must check the WAIT register to determine whether an event has occurred. The software
can put the processor in sleep mode by setting in the WAIT register the bit(s) of the
event(s) by which it wants to be woken up.

According to Figure 12, writing a one to the event 'X' bit of the WAIT register sets the
corresponding "SET" signal (wait_x_set) to one and, at the same time, sets the global
signal wait to one. The processor then goes into sleep mode and waits for event X.

When this event occurs (event_x goes to one), the corresponding "EVENT" signal
(wait_x_evt) is set to one. One clock cycle later, this signal clears the SET signal
(wait_x_set) and the global signal wait. The processor then resumes its operations.

The software is responsible for clearing the EVENT bit and for determining which event
woke up the processor if more than one bit has been set in the WAIT register. By reading
the WAIT register, the software reads all the EVENT bits (wait_?_evt) and also clears
most of the bits (timer event bits are cleared only by writing to the corresponding TIMER
register).
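
The SET and EVENT behaviour of the WAIT register can be sketched as a small Python model. It ignores the one-clock delay between the EVENT signal and the clearing of the SET signal. It also makes several assumptions that the description does not confirm: that an event latches its EVENT bit even when the processor is not waiting for it, that only the SET bit of the event that occurred is cleared, and that the timer event bits are identified by a mask (timer_mask, a hypothetical parameter). The class and method names are likewise hypothetical.

```python
class WaitRegister:
    """Behavioural sketch of the WAIT register; clock delays are not modelled."""

    def __init__(self, timer_mask: int = 0) -> None:
        self.set_bits = 0     # wait_x_set signals, one bit per event
        self.evt_bits = 0     # wait_x_evt signals, one bit per event
        self.waiting = False  # global signal "wait"
        # Hypothetical: event bits that a read does NOT clear (timer events).
        self.timer_mask = timer_mask & 0xFFFF

    def write(self, mask: int) -> None:
        """Software requests to sleep until one of the masked events occurs."""
        mask &= 0xFFFF
        if mask:
            self.set_bits |= mask
            self.waiting = True

    def event(self, x: int) -> None:
        """Event x occurs; wake the processor if it was waiting for it."""
        if not 0 <= x < 16:
            raise ValueError("the mRISP allows up to 16 events")
        bit = 1 << x
        self.evt_bits |= bit  # assumption: latched even if not waited for
        if self.set_bits & bit:
            self.set_bits &= ~bit
            self.waiting = False

    def read(self) -> int:
        """Return all EVENT bits and clear the non-timer ones."""
        value = self.evt_bits
        self.evt_bits &= self.timer_mask
        return value
```

After waking, the software would read the register once and test the returned bits to find out which event occurred.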

Figure 20 provides an implementation of the Event Block.

Macros

Macros, which are interpreted by the compiler, are added to the instructions. These make
the mRISP easier to program and make the resulting assembly code more understandable
and maintainable. They are built over the two instructions in conjunction with registers.
For example the JMP macro, which is used to jump to another part of the program, is
in fact a LOAD instruction with the destination address equal to the Program Counter
register's address and the constant data equal to the address to jump to in the program
memory.
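
The way a compiler might expand such macros can be sketched symbolically in Python. Only JMP and RET are shown, following the descriptions given in the text. Register names stand in for the real register addresses, and the jump-condition flag bits are left out, because their encoding is defined only in the figures. The function name expand and the tuple format are hypothetical.

```python
PCNT = "PCNT"  # symbolic stand-in for the Program Counter register address
RETA = "RETA"  # symbolic stand-in for the Return register address


def expand(macro: str, *operands: int) -> list:
    """Expand a macro into a list of LOAD/MOVE instructions (symbolic form).

    A LOAD is represented as ("LOAD", destination, constant) and a MOVE as
    ("MOVE", source, destination).
    """
    if macro == "JMP":
        (target,) = operands  # exactly one operand: the address to jump to
        return [("LOAD", PCNT, target)]
    if macro == "RET":
        if operands:
            raise ValueError("RET takes no operands")
        return [("MOVE", RETA, PCNT)]
    raise ValueError(f"unknown macro: {macro}")
```

In this representation a jump to program address 0x40 becomes a single LOAD of the constant 0x40 into PCNT.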

Figure 18 provides a possible set of macros that could be used.

Data Path

For each instruction, a 16-bit data word is transferred from one location to another.
The source may be the program memory (on a LOAD), or one of the internal
registers or the user memory (on a MOVE). The destination may be either one of
the internal registers or the user memory.

The higher byte and the lower byte of the data are swapped when only one
of the location addresses is odd (bit 0 is high). This is very useful for reversing the byte
ordering, since Internet Protocols are big-endian and the mRISP is little-endian.
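
The swap rule amounts to an exclusive OR of bit 0 of the two addresses, as the following Python sketch shows for a MOVE. The function name transfer is hypothetical, and the sketch covers only the case where both a source and a destination address exist.

```python
def transfer(word: int, src_addr: int, dst_addr: int) -> int:
    """Return the 16-bit word as it arrives at the destination.

    The bytes are swapped when exactly one of the two addresses is odd,
    i.e. when bit 0 of the source differs from bit 0 of the destination.
    """
    word &= 0xFFFF
    if (src_addr ^ dst_addr) & 1:
        return ((word << 8) | (word >> 8)) & 0xFFFF
    return word
```

With this rule, moving a word between two odd addresses or two even addresses leaves it unchanged, while marking either one of them as odd converts between big-endian and little-endian byte order.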

Figure 3 describes the data path. 

State Machine 

The mRISP state machine synchronizes internal and external control signals to provide
efficient timing. The LOAD instruction takes only one clock cycle and the MOVE
instruction takes two clock cycles. An extra clock cycle is added to the timing on a jump
in the program counter.

Figure 2 provides a diagram of the mRISP State Machine. The State Machine has only
four states. The RESET state is reached whenever the signal rst_n is asserted. At the
first cycle where rst_n is de-asserted, the state machine goes to the FETCH32 state.

The FETCH32 state decodes the instruction presented on the pdata_in bus. Depending
on the value of the opcode and the signals hold and wait, the next state can be
WAIT_ACK, JUMP or FETCH32 again. The signal wait is used in this state to keep the
processor waiting for an event, defined previously by writing in the WAIT register.
During this waiting, no instruction fetches, writes or reads are performed. In the
FETCH32 state, the signal hold has the same effect as the signal wait but it is generated
by external logic. The reason for its assertion may be that data from the program memory
is not ready due to slow memory, that the write from the previous instruction into
external memories takes more than one clock cycle, or any other reason. If the
signals wait and hold are not asserted and the opcode is MOVE, a read is performed from
the source address and the next state is WAIT_ACK. If they are not asserted and the
opcode is LOAD, the LOAD instruction is performed: the constant data contained in the
instruction is written to the destination address. If the destination address is the
Program Counter Register (PCNT) and the flag indicates an unconditional jump or a true
conditional jump (signal jump is asserted), the State Machine goes to the JUMP state.
Otherwise, it stays in the same state, ready for the next instruction.

The WAIT_ACK state waits for the read data from the source address to be ready. If it is
not, the external logic must keep the signal hold asserted until the data is ready. When it
is ready, the State Machine returns to the FETCH32 state unless the destination
address of the MOVE instruction was the Program Counter Register (PCNT) and the flag
indicated an unconditional jump or a true conditional jump (signal jump is asserted). In
this last case, the next state is JUMP.

The JUMP state is an idle state where the cycle is used only to fetch the instruction
pointed to by the new address loaded in the Program Counter Register.
The State Machine returns to the FETCH32 state unless the external logic keeps the
signal hold asserted for any reason.
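
The four states and their transitions, as described in the preceding paragraphs, can be summarized in a Python next-state function. This is a sketch of the transition rules only: the reads, writes and fetches performed in each state are not modelled, and the function and parameter names are hypothetical. The parameter reset stands for the asserted condition of rst_n, whatever its electrical polarity.

```python
from enum import Enum


class State(Enum):
    RESET = "RESET"
    FETCH32 = "FETCH32"
    WAIT_ACK = "WAIT_ACK"
    JUMP = "JUMP"


def next_state(state: State, *, reset: bool = False, hold: bool = False,
               wait: bool = False, is_move: bool = False,
               jump: bool = False) -> State:
    """Return the state for the next clock cycle.

    jump models the signal of the same name: the destination is PCNT and
    the jump is unconditional or its condition is true.
    """
    if reset:
        return State.RESET
    if state is State.RESET:
        return State.FETCH32
    if state is State.FETCH32:
        if hold or wait:
            return State.FETCH32   # stalled: nothing is fetched or executed
        if is_move:
            return State.WAIT_ACK  # MOVE: read issued, wait for the data
        return State.JUMP if jump else State.FETCH32  # LOAD completes here
    if state is State.WAIT_ACK:
        if hold:
            return State.WAIT_ACK  # read data not ready yet
        return State.JUMP if jump else State.FETCH32
    # JUMP: one idle cycle to fetch from the new address, unless held.
    return State.JUMP if hold else State.FETCH32
```

Counting the states visited reproduces the stated timing: a LOAD occupies one cycle (FETCH32), a MOVE two cycles (FETCH32 then WAIT_ACK), and a taken jump adds one cycle (JUMP).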

A cycle by cycle representation of the State Machine with different cases is provided in 
Figure 19. 

It is to be understood that this description is intended to be illustrative, and not 
restrictive. Many other embodiments will be apparent to those of skill in the art upon
reviewing the above description. The scope of the invention should, therefore, be 
determined with reference to the appended claims, along with the full scope of 
equivalents to which such claims are entitled. 

The embodiment(s) of the invention described above is (are) intended to be exemplary
only. The scope of the invention is therefore intended to be limited solely by the scope
of the appended claims. 

CLAIMS



1. Apparatus providing a specialized microprocessor or hardwired circuitry to
process IP packets for video communications and control of a video source. 

2. A method for operation of a specialized microprocessor or hardwired circuitry to
process IP packets for video communications and control of a video source.

Figure 1
Figure 2
Figure 20
Figure 22